Two-pattern strings II - frequency of occurrence and substring complexity

Authors

  • Frantisek Franek
  • Jiandong Jiang
  • William F. Smyth
Abstract

The two previous papers in this series introduced a class of infinite binary strings, called two-pattern strings, that includes and significantly generalizes the much-studied Sturmian strings. The class of two-pattern strings is the union of a sequence of subclasses $TP_\lambda$, increasing with respect to inclusion, of two-pattern strings of scope $\lambda$, $\lambda = 1, 2, \ldots$. Prefixes of two-pattern strings are interesting from the algorithmic point of view (their recognition, their generation, and the computation of their repetitions and near-repetitions), and since they include prefixes of the Fibonacci and the Sturmian strings, it is worth investigating how many of the binary strings of a given length are finite two-pattern strings. In this paper we first consider the frequency $f_\lambda(n)$ of occurrence of two-pattern strings of length $n$ and scope $\lambda$ among all strings of length $n$ on $\{a, b\}$: we show that $\lim_{n \to \infty} f_\lambda(n) = 0$, but that for strings of length $n \le 2\lambda$, two-pattern strings of scope $\lambda$ constitute more than one quarter of all strings. Since the class of Sturmian strings is a subset of two-pattern ...

∗ Also at School of Computing, Curtin University, Perth WA 6845, Australia, and Department of Computer Science, King's College London.
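The definition of two-pattern strings of scope λ is not reproduced on this page, so the sketch below does not compute $f_\lambda(n)$ itself. As a hedged illustration of the same kind of quantity, it uses the subclass the abstract mentions: finite factors of Sturmian strings, which are known to be exactly the balanced binary words (any two factors of equal length differ by at most one in their number of a's). It counts, by brute force over all $2^n$ strings, the fraction that are balanced (a Sturmian analogue of $f_\lambda(n)$, which likewise tends to 0), and checks the substring (factor) complexity of a Fibonacci-word prefix, which should be $k + 1$ distinct factors of each length $k$. All function names are hypothetical and chosen for illustration; this is not the authors' method.

```python
from itertools import product


def is_balanced(w: str) -> bool:
    """True if any two factors of w with equal length differ by at most 1 in their count of 'a'."""
    n = len(w)
    for k in range(1, n + 1):
        counts = [w[i:i + k].count("a") for i in range(n - k + 1)]
        if max(counts) - min(counts) > 1:
            return False
    return True


def balanced_frequency(n: int) -> float:
    """Fraction of the 2**n strings of length n over {a, b} that are balanced (finite Sturmian)."""
    hits = sum(1 for t in product("ab", repeat=n) if is_balanced("".join(t)))
    return hits / 2 ** n


def fibonacci_prefix(length: int) -> str:
    """Prefix of the infinite Fibonacci word, the fixed point of the morphism a -> ab, b -> a."""
    w = "a"
    while len(w) < length:
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w[:length]


def factor_complexity(w: str, k: int) -> int:
    """Number of distinct substrings (factors) of length k occurring in w."""
    return len({w[i:i + k] for i in range(len(w) - k + 1)})


if __name__ == "__main__":
    # Frequency of balanced words among all binary strings of length n.
    for n in range(1, 11):
        print(n, balanced_frequency(n))
    # Factor complexity of a long Fibonacci prefix for k = 1..10.
    fib = fibonacci_prefix(1000)
    print([factor_complexity(fib, k) for k in range(1, 11)])
```

The enumeration is exponential in $n$, so it is only a sanity check for small lengths; for $n = 1, \ldots, 5$ it finds 2, 4, 8, 14 and 24 balanced words, so the fraction is already falling below 1 at $n = 4$.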

Similar resources

A Template Discovery Algorithm by Substring Amplification

In this paper, we consider the problem of finding a set of substrings common to given strings. We define this problem as the template discovery problem, which is, given a set of strings generated by some fixed but unknown pattern, to find the constant parts of the pattern. A pattern is a string over constant and variable symbols. It generates strings by replacing variables with constant strings. We assume that...

A Linear-Space Algorithm for the Substring Constrained Alignment Problem

In a string similarity metric adopting affine gap penalties, we propose a quadratic-time, linear-space algorithm for the following constrained string alignment problem. The input of the problem is a pair of strings to be aligned and a pattern given as a string. Let an occurrence of the pattern in a string be a minimal substring of the string that is most similar to the pattern. Then, the output...

Finding Characteristic Substrings from Compressed Texts

Text mining from large-scale data is of great importance in computer science. In this paper, we consider fundamental problems on text mining from compressed strings, i.e., computing a longest repeating substring, longest non-overlapping repeating substring, most frequent substring, and most frequent non-overlapping substring from a given compressed string. Also, we tackle the following novel p...

Abelian pattern matching in strings

Abelian pattern matching is a new class of pattern matching problems. In abelian patterns, the order of the characters in the substrings does not matter, e.g. the strings abbc and babc represent the same abelian pattern a+2b+c. Therefore, unlike classical pattern matching, we do not look for an exact (ordered) occurrence of a substring, rather the aim here is to find any permutation of a given ...

Two-Pattern Strings — Computing Repetitions & Near-Repetitions

In a recent paper we introduced infinite two-pattern strings on the alphabet {a, b} as a generalization of Sturmian strings, and we posed three questions about them: • Given a finite string x, can we in linear time O(|x|) recognize whether or not x is a prefix/substring of some infinite two-pattern string? • If recognized as two-pattern, can all the repetitions in x be computed in linear time? ...

Journal title:
  • J. Discrete Algorithms

Volume 5

Publication date: 2007